{
  "cells": [
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/00-azure-openai-retrieval.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/langchain/00-azure-openai-retrieval.ipynb)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "AWGzucuFfbBn"
      },
      "source": [
        "# Using Azure's OpenAI with LangChain"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "id": "r-ryCeG_f_GC"
      },
      "outputs": [],
      "source": [
        "!pip install -qU \\\n",
        "    langchain==0.0.227 \\\n",
        "    openai==0.27.8 \\\n",
        "    \"pinecone-client[grpc]\"==3.1.0 \\\n",
        "    pinecone-datasets==0.7.0"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "QTOov1l53Nzs"
      },
      "source": [
        "## Building the Knowledge Base\n",
        "\n",
        "Adding an external knowledge to chatbots allows us to ground generation to this external knowledge. For our use-case our external knowledge will be the LangChain docs. We can load this from Pinecone datasets like so:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "pzoBAqUN5El_",
        "outputId": "0bd3f6ba-6f35-4a46-f0b2-8ebd96fcf71d"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "\n",
              "\n",
              "  <div id=\"df-8f658644-742d-4e6f-b688-29710288691a\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>id</th>\n",
              "      <th>values</th>\n",
              "      <th>metadata</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0.0</th>\n",
              "      <td>417ede5d-39be-498f-b518-f47ed4e53b90</td>\n",
              "      <td>[0.005949743557721376, 0.01983247883617878, -0...</td>\n",
              "      <td>{'chunk': 0, 'text': '.rst\n",
              ".pdf\n",
              "Welcome to Lan...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1.0</th>\n",
              "      <td>110f550d-110b-4378-b95e-141397fa21bc</td>\n",
              "      <td>[0.009401749819517136, 0.02443608082830906, 0....</td>\n",
              "      <td>{'chunk': 1, 'text': 'Use Cases#\n",
              "Best practice...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2.0</th>\n",
              "      <td>d5f00f02-3295-4567-b297-5e3262dc2728</td>\n",
              "      <td>[-0.005517194513231516, 0.0208403542637825, 0....</td>\n",
              "      <td>{'chunk': 2, 'text': 'Gallery: A collection of...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3.0</th>\n",
              "      <td>0b6fe3c6-1f0e-4608-a950-43231e46b08a</td>\n",
              "      <td>[-0.006499645300209522, 0.0011573900701478124,...</td>\n",
              "      <td>{'chunk': 0, 'text': 'Search\n",
              "Error\n",
              "Please acti...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4.0</th>\n",
              "      <td>39d5f15f-b973-42c0-8c9b-a2df49b627dc</td>\n",
              "      <td>[-0.005658374633640051, 0.00817849114537239, 0...</td>\n",
              "      <td>{'chunk': 0, 'text': '.md\n",
              ".pdf\n",
              "Dependents\n",
              "Depe...</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-8f658644-742d-4e6f-b688-29710288691a')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "\n",
              "\n",
              "\n",
              "    <div id=\"df-9dc7fd91-952f-4102-a97d-f0003177be21\">\n",
              "      <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-9dc7fd91-952f-4102-a97d-f0003177be21')\"\n",
              "              title=\"Suggest charts.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "      </button>\n",
              "    </div>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "    background-color: #E8F0FE;\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: #1967D2;\n",
              "    height: 32px;\n",
              "    padding: 0 0 0 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: #E2EBFA;\n",
              "    box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: #174EA6;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "    background-color: #3B4455;\n",
              "    fill: #D2E3FC;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart:hover {\n",
              "    background-color: #434B5C;\n",
              "    box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "    filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "    fill: #FFFFFF;\n",
              "  }\n",
              "</style>\n",
              "\n",
              "    <script>\n",
              "      async function quickchart(key) {\n",
              "        const containerElement = document.querySelector('#' + key);\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      }\n",
              "    </script>\n",
              "\n",
              "      <script>\n",
              "\n",
              "function displayQuickchartButton(domScope) {\n",
              "  let quickchartButtonEl =\n",
              "    domScope.querySelector('#df-9dc7fd91-952f-4102-a97d-f0003177be21 button.colab-df-quickchart');\n",
              "  quickchartButtonEl.style.display =\n",
              "    google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "}\n",
              "\n",
              "        displayQuickchartButton(document);\n",
              "      </script>\n",
              "      <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-8f658644-742d-4e6f-b688-29710288691a button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-8f658644-742d-4e6f-b688-29710288691a');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "text/plain": [
              "                                       id  \\\n",
              "0.0  417ede5d-39be-498f-b518-f47ed4e53b90   \n",
              "1.0  110f550d-110b-4378-b95e-141397fa21bc   \n",
              "2.0  d5f00f02-3295-4567-b297-5e3262dc2728   \n",
              "3.0  0b6fe3c6-1f0e-4608-a950-43231e46b08a   \n",
              "4.0  39d5f15f-b973-42c0-8c9b-a2df49b627dc   \n",
              "\n",
              "                                                values  \\\n",
              "0.0  [0.005949743557721376, 0.01983247883617878, -0...   \n",
              "1.0  [0.009401749819517136, 0.02443608082830906, 0....   \n",
              "2.0  [-0.005517194513231516, 0.0208403542637825, 0....   \n",
              "3.0  [-0.006499645300209522, 0.0011573900701478124,...   \n",
              "4.0  [-0.005658374633640051, 0.00817849114537239, 0...   \n",
              "\n",
              "                                              metadata  \n",
              "0.0  {'chunk': 0, 'text': '.rst\n",
              ".pdf\n",
              "Welcome to Lan...  \n",
              "1.0  {'chunk': 1, 'text': 'Use Cases#\n",
              "Best practice...  \n",
              "2.0  {'chunk': 2, 'text': 'Gallery: A collection of...  \n",
              "3.0  {'chunk': 0, 'text': 'Search\n",
              "Error\n",
              "Please acti...  \n",
              "4.0  {'chunk': 0, 'text': '.md\n",
              ".pdf\n",
              "Dependents\n",
              "Depe...  "
            ]
          },
          "execution_count": 2,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "from pinecone_datasets import load_dataset\n",
        "\n",
        "dataset = load_dataset('langchain-python-docs-text-embedding-ada-002')\n",
        "# we drop sparse_values as they are not needed for this example\n",
        "dataset.documents.drop(['metadata', 'sparse_values'], axis=1, inplace=True)\n",
        "dataset.documents.rename(columns={'blob': 'metadata'}, inplace=True)\n",
        "dataset.head()"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "I-e33-oABWmB"
      },
      "source": [
        "We must change the `\"url\"` field in the **metadata** column to `\"source\"` for compatibility with later LangChain components."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "id": "aFss5U1RFl1U"
      },
      "outputs": [],
      "source": [
        "for i, row in dataset.documents.iterrows():\n",
        "    row['metadata']['source'] = row['metadata'].pop('url')"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "GMFaruC43NOU"
      },
      "source": [
        "Our input docs are ready so we can move onto indexing everything."
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "rJSTrhOZ51On"
      },
      "source": [
        "## Initializing the Index\n",
        "\n",
        "Now we need a place to store these embeddings and enable a efficient vector search through them all. To do that we use Pinecone, we can get a [free API key](https://app.pinecone.io/) and enter it below where we will initialize our connection to Pinecone and create a new index."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "id": "Ta9-67QN51oj"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "from pinecone import Pinecone\n",
        "\n",
        "# initialize connection to pinecone (get API key at app.pinecone.io)\n",
        "api_key = os.environ.get('PINECONE_API_KEY') or 'PINECONE_API_KEY'\n",
        "\n",
        "# configure client\n",
        "pc = Pinecone(api_key=api_key)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we setup our index specification, this allows us to define the cloud provider and region where we want to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from pinecone import ServerlessSpec\n",
        "\n",
        "cloud = os.environ.get('PINECONE_CLOUD') or 'aws'\n",
        "region = os.environ.get('PINECONE_REGION') or 'us-east-1'\n",
        "\n",
        "spec = ServerlessSpec(cloud=cloud, region=region)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Create the index:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "id": "TKWZmg9l6Cuj"
      },
      "outputs": [],
      "source": [
        "index_name = 'azure-openai-langchain-intro'"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "jTSFmN-K6HV8",
        "outputId": "3680b637-a5c1-4cee-b22d-a8409bd0cefd"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'dimension': 1536,\n",
              " 'index_fullness': 0.0,\n",
              " 'namespaces': {},\n",
              " 'total_vector_count': 0}"
            ]
          },
          "execution_count": 6,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "import time\n",
        "\n",
        "# check if index already exists (it shouldn't if this is first time)\n",
        "if index_name not in pc.list_indexes().names():\n",
        "    # if does not exist, create index\n",
        "    pc.create_index(\n",
        "        index_name,\n",
        "        dimension=1536,  # dimensionality of text-embedding-ada-002\n",
        "        metric='cosine',\n",
        "        spec=spec\n",
        "    )\n",
        "    # wait for index to be initialized\n",
        "    while not pc.describe_index(index_name).status['ready']:\n",
        "        time.sleep(1)\n",
        "\n",
        "# connect to index\n",
        "index = pc.Index(index_name)\n",
        "# view index stats\n",
        "index.describe_index_stats()"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "ZPnVHRNt6eCd"
      },
      "source": [
        "Now we add all of our docs to Pinecone:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 98,
          "referenced_widgets": [
            "bb4b1deb4d324c459ecc58ff987923ac",
            "037cdcbedf9745099dec9c021217f91c",
            "9c272465c73c401daf0e323b49c145a8",
            "21aa8b33791e44f797cf4ea6862460b2",
            "e701b82c73db4a0abe8a50d6efa2454a",
            "22ea72881fc94694aa7e4511ac2cd528",
            "39fb48d1d26f4ce1a02e5a4854e46312",
            "0b4f718f21434759a777e49f7053b432",
            "14693a607c7042a98364f3f6d7c0cd0c",
            "3ec8c293047c438d91405f51cef54875",
            "442ac390a53c43e7b37c4ff9663a4565",
            "99fb6f80660d491c9c0b35b61756407a",
            "a4edbd751a0d497b98d3bd9af4f8444b",
            "790bcea9241e4c9bb2cdfe2177cdba52",
            "28c1711a93294f38b477cf46621d22d5",
            "938e8a4b843043cb8350fbe714a6d83e",
            "41d03cbe02284448b5792da6137da533",
            "253d5a11c0b948f29cc4cb528300295c",
            "a59445fb528b45dc9943545f24ae3543",
            "0e7d09a428d04caca8bd83f8f436621d",
            "90250c5b719c4c66b30ae2c68ffa0b49",
            "3d6aea5e4ac148e2942a9e99cc4db8a2"
          ]
        },
        "id": "rPkX2a-c6gXb",
        "outputId": "8ccbca1a-4336-4058-d852-473bd1e6aa3b"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "bb4b1deb4d324c459ecc58ff987923ac",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "sending upsert requests:   0%|          | 0/6952 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "99fb6f80660d491c9c0b35b61756407a",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "collecting async responses:   0%|          | 0/70 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/plain": [
              "upserted_count: 6952"
            ]
          },
          "execution_count": 7,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "index.upsert_from_dataframe(dataset.documents, batch_size=100)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "Pzoh2J5E6jJl"
      },
      "source": [
        "After indexing everything we can check the number of vectors in our index like so:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "gkM-ku8j6m5K",
        "outputId": "b21ec780-acf4-4ee9-e956-fdce486abb24"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'dimension': 1536,\n",
              " 'index_fullness': 0.0,\n",
              " 'namespaces': {'': {'vector_count': 3476}},\n",
              " 'total_vector_count': 3476}"
            ]
          },
          "execution_count": 8,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "index.describe_index_stats()"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "3quLsQzw9Jrb"
      },
      "source": [
        "## Initializing Azure OpenAI"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "0fOo9qQvDgkz"
      },
      "source": [
        "To use OpenAI's service via Azure we first need to setup the service in Azure and in **Azure OpenAI Studio** we need to create two *Deployments*, one using `gpt-4` and another using `text-embedding-ada-002`.\n",
        "\n",
        "Once we've done this we need to set a few environment variables (all found in **Azure OpenAI Studio**) like so:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "id": "deWmOJecfbBr"
      },
      "outputs": [],
      "source": [
        "os.environ['OPENAI_API_KEY'] = 'YOUR_OPENAI_API_KEY'\n",
        "os.environ['OPENAI_API_TYPE'] = 'azure'\n",
        "os.environ['OPENAI_API_VERSION'] = '2023-03-15-preview'\n",
        "os.environ['OPENAI_API_BASE'] = 'https://azure-pinecone-demo.openai.azure.com/'"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "2AWnaTCP0Ryg"
      },
      "source": [
        "We can now connect to both of our deployments via LangChain. First our `ChatCompletion` endpoint which uses `gpt-3.5-turbo`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "id": "ZhQSDoYe0ly4"
      },
      "outputs": [],
      "source": [
        "from langchain.chat_models import AzureChatOpenAI\n",
        "\n",
        "llm = AzureChatOpenAI(\n",
        "    deployment_name=\"gpt4\",\n",
        "    model_name=\"gpt-4\"\n",
        ")"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "h4WOy6zCyAJF"
      },
      "source": [
        "And then our embedding endpoint which uses `text-embedding-ada-002`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "id": "j10WYSvbyGv8"
      },
      "outputs": [],
      "source": [
        "from langchain.embeddings import OpenAIEmbeddings\n",
        "\n",
        "embed = OpenAIEmbeddings(\n",
        "    deployment='embedding',\n",
        "    model='text-embedding-ada-002'\n",
        ")"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "gaatTCay96VF"
      },
      "source": [
        "## Initializing Retrieval Component with LangChain"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "5hUPG7IG9e_8"
      },
      "source": [
        "Before we move on, we must also initialize a connection to our index via LangChain. We need this for compatibility with later LangChain components. To use this we pass the `index` from above into a LangChain `vectorstores.Pinecone` object:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "id": "40zQcOJh9yvh"
      },
      "outputs": [],
      "source": [
        "from langchain.vectorstores import Pinecone\n",
        "\n",
        "text_field = \"text\"\n",
        "\n",
        "# switch back to normal index for langchain\n",
        "index = pc.Index(index_name)\n",
        "\n",
        "vectorstore = Pinecone(\n",
        "    index, embed.embed_query, text_field\n",
        ")"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "rJ_Xo5GJ9-ej"
      },
      "source": [
        "## Initializing the RetrievalQA Component"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "yS87i9wa_Ck4"
      },
      "source": [
        "The `RetrievalQA` and `RetrievalQAWithSourcesChain` are both components in LangChain that allow us to ask a natural language query and return a response grounded in the knowledge retrieved from our knowledge base. We can implement this and include original data sources like so:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "id": "ZgpuJlYa_VL6"
      },
      "outputs": [],
      "source": [
        "from langchain.chains import RetrievalQAWithSourcesChain\n",
        "\n",
        "qa = RetrievalQAWithSourcesChain.from_chain_type(\n",
        "    llm=llm,\n",
        "    chain_type=\"stuff\",\n",
        "    retriever=vectorstore.as_retriever()\n",
        ")"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "aryiOoS8aLsz"
      },
      "source": [
        "Now we can begin asking questions about LangChain!"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "ICFmRHCV_YiD",
        "outputId": "9dc06ba1-c655-49bf-f196-af864705c325"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'question': 'can you tell me about the PromptLayer for OpenAI in LangChain?',\n",
              " 'answer': 'PromptLayer for OpenAI in LangChain is a middleware that allows developers to track, manage, and share GPT prompt engineering. It records all OpenAI API requests, enabling users to search and explore request history in the PromptLayer dashboard. LangChain provides PromptLayer wrappers for LLM, PromptLayerChatOpenAI, and PromptLayerOpenAIChat. To use PromptLayer within LangChain, you need to install the promptlayer python library, create a PromptLayer account, and create an API token to set as an environment variable (PROMPTLAYER_API_KEY).\\n\\n',\n",
              " 'sources': '\\n- https://python.langchain.com/en/latest/integrations/promptlayer.html\\n- https://python.langchain.com/en/latest/modules/models/llms/integrations/promptlayer_openai.html\\n- https://python.langchain.com/en/latest/modules/models/chat/integrations/promptlayer_chatopenai.html'}"
            ]
          },
          "execution_count": 14,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "qa(\"can you tell me about the PromptLayer for OpenAI in LangChain?\")"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "VoQFiqntaPya"
      },
      "source": [
        "We can format responses nicely like so:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 98
        },
        "id": "kmjmHp0I1pbe",
        "outputId": "a825795f-3e5c-4a75-8da9-d4468651bc29"
      },
      "outputs": [
        {
          "data": {
            "text/markdown": [
              "You would use an output parser in LangChain to get more structured information than just text back from language model responses. Output parsers are classes that help structure language model responses by implementing methods to format and parse the output into a desired structure. This can be useful in cases where you need specific data structures or structured information for further processing.\n"
            ],
            "text/plain": [
              "<IPython.core.display.Markdown object>"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "https://python.langchain.com/docs/modules/model_io/output_parsers/\n"
          ]
        }
      ],
      "source": [
        "from IPython.display import display, Markdown\n",
        "\n",
        "res = qa(\"why would I use an output parser in LangChain?\")\n",
        "display(Markdown(res['answer']))\n",
        "print(res['sources'])"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 474
        },
        "id": "trnhMhielMp0",
        "outputId": "3fa9dd12-cf3a-4d86-c1f9-dcfb97361b3f"
      },
      "outputs": [
        {
          "data": {
            "text/markdown": [
              "To use output parsers, you need to follow these steps:\n",
              "\n",
              "1. Choose the output parser that fits your needs, such as PydanticOutputParser, RetryOutputParser, or OutputFixingParser.\n",
              "2. Implement the necessary methods in the output parser, such as `get_format_instructions()` and `parse(str)`.\n",
              "3. Optionally, implement the `parse_with_prompt(str, PromptValue)` method if you need additional information from the prompt to parse the output.\n",
              "4. Use the output parser in your language model code, like in the PromptTemplate or ChatOpenAI.\n",
              "\n",
              "Example usage with PydanticOutputParser:\n",
              "\n",
              "```python\n",
              "from langchain.prompts import PromptTemplate\n",
              "from langchain.llms import OpenAI\n",
              "from langchain.output_parsers import PydanticOutputParser\n",
              "from pydantic import BaseModel, Field, validator\n",
              "\n",
              "class Joke(BaseModel):\n",
              "    setup: str = Field(description=\"question to set up a joke\")\n",
              "    punchline: str = Field(description=\"answer to resolve the joke\")\n",
              "\n",
              "parser = PydanticOutputParser(pydantic_object=Joke)\n",
              "prompt = PromptTemplate(\n",
              "    template=\"Your prompt template here\",\n",
              "    input_variables=[\"your_input_variables\"],\n",
              "    partial_variables={\"format_instructions\": parser.get_format_instructions()}\n",
              ")\n",
              "```\n",
              "\n"
            ],
            "text/plain": [
              "<IPython.core.display.Markdown object>"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "- https://python.langchain.com/en/latest/modules/prompts/output_parsers.html\n",
            "- https://python.langchain.com/docs/modules/model_io/output_parsers/\n",
            "- https://python.langchain.com/en/latest/modules/prompts/output_parsers/examples/retry.html\n"
          ]
        }
      ],
      "source": [
        "from IPython.display import display, Markdown\n",
        "\n",
        "res = qa(\"how can I use output parsers?\")\n",
        "display(Markdown(res['answer']))\n",
        "print(res['sources'])"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "6hKiHbUVaVmR"
      },
      "source": [
        "---"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.9.12 (main, Apr  5 2022, 01:52:34) \n[Clang 12.0.0 ]"
    },
    "orig_nbformat": 4,
    "vscode": {
      "interpreter": {
        "hash": "b8e7999f96e1b425e2d542f21b571f5a4be3e97158b0b46ea1b2500df63956ce"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}